Abstract

Conspiracy beliefs are common, and can cause harm to individuals and their 
communities. Our aim was to examine the social and linguistic characteristics of forum 
users who became active participants in a large online conspiracy forum. The study is a 
retrospective case-control study using a large dataset from the online forum network 
Reddit, comparing users who would go on to post comments in a conspiracy forum with 
a group of control users. The analyses show that prior to posting in conspiracy forums, 
these users consistently exhibited anger and used third person pronouns 
disproportionately more often than the control group. A community structure analysis 
of these users revealed substantial heterogeneity—users from different communities 
varied in both linguistic characteristics and in the topics of the forums in which they 
were involved. The results suggest that a desire to belong to an in-group and 
mechanisms for reinforcing opinions through peer feedback appeared to create social 
spaces in which conspiracy beliefs were normalised, and these social spaces were not 
specific to ideologies and interests traditionally associated with conspiracy belief. 


Introduction 


Conspiracy theories (beliefs attributing agency over important world events to the
secret plotting of powerful, malevolent groups) have been remarkably popular over a
sustained period of time [7,8,22,45,58]. Conspiracy beliefs can cause harm to
individuals and to the societies in which they live; they are associated with lowered
intention to participate in social and political causes [28], unwillingness to follow
authoritative medical advice, increased willingness to seek alternative medicine,
and a tendency to reject important scientific findings [31,48,27,39].


Studies in the area have focused on identifying the individual psychological factors
that are exhibited by, or predispose people to, conspiracy beliefs. These studies suggest
that conspiracy belief does not correspond to a specific pathology, and is instead
represented by a diverse range of characteristics [4,28,52]. Early work pointed to
feelings of powerlessness: conspiracy theories offer the satisfaction of an explanation for
events that otherwise seem beyond control [25,34]. Conspiracy beliefs are also
associated with higher levels of anomie and powerlessness [1], political cynicism and
defiance of authority [52], paranoid ideation, paranormal belief, and schizotypy [4,14],
and are negatively associated with agreeableness [52]. Those who already believe one
conspiracy theory are also more likely to accept further conspiracy theories, which has
led to the hypothesis that those who hold conspiracy beliefs assimilate new conspiracies
within their existing conspiratorial world-view [22,51,53,60].


Relative to individual psychological characteristics, the social factors that are
associated with conspiracy belief have received less attention [20]. Until recently,
conspiracy belief has been challenging to observe in natural settings where it is also
possible to observe social factors, but new sources of longitudinal social media data
provide new opportunities to examine social phenomena at scale [24]. These data
sources have advantages over traditional survey-based studies, including the size of the
cohort and the ability to observe the public behaviour of people online without
influencing the participants. Recent examples in related domains include studies that
observe or simulate the spread of conspiracy belief and other forms of misinformation in
social media settings [18,19,43]. These studies focus on the effect of network structure
rather than individual differences in personality and psychology. Others have examined
the language of social media users to predict behaviour changes relative to mental
health conditions [11,15,17]. To our knowledge, none have specifically addressed
conspiracy belief.

Reddit is a network of online forums (subreddits), and is an example of a natural
setting for observing human behaviour at scale. Research using Reddit data has been
relatively infrequent, but such data have been used to study the structure of
conversations and propagation of information [10], to examine hateful and offensive
speech [35,37,46], and for a range of linguistic analyses [13,55]. The most closely
related study analysed the language used on Reddit to examine how Reddit is used in
social support of mental illness [16].

Reddit has a dedicated forum for discussing conspiracy theories: r/conspiracy. Our 
aim was to address the notion that there is a single dominant pathology for conspiracy 
belief, and that a certain set of psychological or ideological characteristics are risk 
factors that predispose individuals to the condition. To do this, we measured the social 
and linguistic factors that were associated with users’ future involvement in a 
conspiracy theory forum on Reddit. 


Methods 

The study design was a retrospective case-control study examining the language use and 
posting patterns of Reddit users. We examined the posting behaviour of r/conspiracy 
users in the period preceding the first time they posted in r/conspiracy, which allowed 
us to measure potential risk factors for becoming embedded in r/conspiracy. 

Dataset 

Reddit posts consist of an initial post followed by nested comments underneath. From a 
publicly available dataset, we examined 1.10 billion comments from 1,419,406
users posted to 224,625 subreddits between October 2007 and May 2015. The dataset 
does not include the initial posts, only the nested sets of comments that follow the posts. 

We identified and removed comments by non-human users (bots) using a heuristic 
that measured the diversity of subreddits in which the users posted and combined this 
with a known list of bots (see Supporting Information). This process identified 466 bots 
in the set, and these were excluded from subsequent analyses. 

The r/conspiracy group was defined as the set of users posting at least 3 comments 
in r/conspiracy, and at least 4 times in each of the six contiguous 30-day periods 
immediately prior to their first post in r/conspiracy. Users who posted in r/conspiracy 
but did not meet both of these criteria were excluded from the analysis. All comments 
posted by included users made before their first post to r/conspiracy were included in 
the subsequent analysis, and these were used to characterize their language use and 
topics of interest.

The control group was defined by the set of users who never posted comments in
r/conspiracy and had posted at least 4 times in any six contiguous 30-day periods. All
comments by users in the control group were included in the subsequent analysis.

Social factors

To determine which of the subreddits might represent important pathways through
which users travel to reach r/conspiracy, we determined the Bayes factor for each
subreddit (see Supporting Information). The results allowed us to identify in which
subreddits r/conspiracy users were over-represented or under-represented relative to the
control group before they began posting in r/conspiracy. From this analysis we selected
the subreddits that were popular and had high Bayes factors for further analysis.

We additionally examined the community structure of users who later became
embedded r/conspiracy users by applying a community structure algorithm to the
network of user co-posting similarity. The network was a weighted and undirected
network in which the weight of the link between any pair of r/conspiracy users was given
by the proportion of their comments in the same subreddits divided by the total number of
comments posted by the pair (i.e. the Jaccard similarity) [30]. We then used a greedy
modularity optimization method [6], which starts with all users in separate communities
and then merges communities according to a gain in modularity, a measure of the
density of connections within versus between communities. The result is a hard
clustering (each r/conspiracy group user is assigned to exactly one community), which
we refer to as pathway communities.


Linguistic Factors

We used Linguistic Inquiry and Word Count (LIWC), a tool used to extract a set of
linguistic characteristics from written text corresponding to emotion, personality, and
language structure [54]. To measure these characteristics, we pre-processed and
concatenated the text of comments posted by each of the users in the r/conspiracy and
control groups (see Supporting Information), before applying the LIWC tool to produce
values for a pre-determined set of 32 linguistic factors.

To report differences in linguistic factors between the r/conspiracy and control
groups, we calculated the Common Language Effect Size (CLES), which gives the
probability that a randomly chosen member of the target group had a higher score than
a randomly chosen member of the control group [33]. The CLES is an intuitive measure
of effect size that is robust to outliers and oddly shaped distributions, and is
interpretable as a Bayesian measure of effect size. To describe the importance of a
linguistic factor, we used Good's deciban [23], which we used as a threshold for the
minimum perceptible difference (see Supporting Information).

To compare the linguistic factors for each of the pathway communities, we repeated
the LIWC analysis and produced a CLES for each of the 32 linguistic factors and for
each community relative to the control. For each of the r/conspiracy communities, we
also characterized them by the set of subreddits in which they were most
over-represented by Bayes factor and by the number of r/conspiracy users within the
subreddits of interest.


Results

After removing bots, 15,850 users met the inclusion criteria for the r/conspiracy group,
and 1,403,411 users met the inclusion criteria for the control group.

Social factors

Of the 38,753 subreddits in which r/conspiracy users posted at least once prior to
posting in r/conspiracy, 4,911 had at least 1,000 users posting in the subreddit, and
1,499 of these had a Bayes factor greater than 4. These subreddits were environments in
which participation may have been a risk factor for later participation in the
r/conspiracy subreddit. The median number of comments posted by r/conspiracy users
in the 1,499 high-risk subreddits was 2,151 (IQR 1,010-4,682), compared to 310 (IQR
161-702) by control group users (see Ancillary Dataset 1 for full details).

A clear clustering of r/conspiracy users was revealed by the community structure
analysis. The analysis identified a set of 12 communities with a median of 1,158
r/conspiracy users in each community (ranging from 180 to 3,003 users). The modularity
of the network clustering was 0.34 (see Figure S2).


Linguistic factors

In 22 of the 32 linguistic factors we examined, the r/conspiracy users exhibited at least
a 1-deciban difference in CLES scores from the control. The largest differences between
the r/conspiracy group and the control group were identified for “anger”, “tone”,
“negative emotion”, “power”, “reward”, and “they”. Other factors that might have been
expected to exhibit differences, such as “anxiety”, “sad”, “you”, and “analytic”, did not
exhibit a clear difference relative to the control (Figure 1).

Pathway communities varied in size, over-representation, and interests (Figure 2).
Some communities were linked by obvious content similarities, such as political or
drug-related subreddits. Others appear to be connected by overall attitude or social
engagement, stemming from subreddits which reflect the unique and often
self-referential culture of Reddit itself. The pathway communities were labelled by the
common themes of the subreddits where the members of the community were most
over-represented (Ancillary Dataset 1).

Relative to the r/conspiracy group as a whole, individual pathway communities
differed considerably for some LIWC scores. For example, in “tone”, “negative
emotion”, and “anger”, communities were consistently different from the control group,
and in “sad”, the communities were consistently similar to the control group (Figure 3).
However, only four of the individual pathway communities exhibited substantially higher
levels of “anxiety” relative to the control group, and the r/conspiracy group as a whole
did not. Pronoun use also varied across the pathway communities. While the use of
“they” pronouns was disproportionately high relative to the control group across most
pathway communities, differences in the use of “we” type pronouns varied (Figure 4).


Discussion

The ability to observe the online behaviour of a large number of people over extended
periods provides a unique view into the characteristics of people who become embedded
in a community of users likely to express conspiracy beliefs, providing an empirical basis
from which to examine risk factors. The results show that online forum users who
become embedded in a community of conspiracy belief are heterogeneous, and do not
always fit the expected profiles of people susceptible to conspiracy belief. Similarly, the
results indicate that conspiracy belief is not specific to certain ideologies or sub-cultures.
Social factors related to the desire to belong to an in-group appear to characterise many
communities [20], and these factors may be exacerbated by the mechanisms of voting
and replying in the subreddits. Together, the results suggest that the apparent
mainstreaming of conspiracy belief in society may be related to the breadth and
diversity of pathways through which users may travel to become embedded in
communities where conspiracy belief is normalised.

Figure 1. Spread of subgroups versus control. CLES scores for the r/conspiracy group
(orange triangles) and individual communities in the r/conspiracy group (green circles).
Markers are coloured where there is at least a 1-deciban difference in the distribution
relative to the control, and are shown in grey otherwise.

Figure 2. An overall picture of the pathway communities. Area is proportional to
number of users in the pathway community; saturation is proportional to the highest
subreddit-specific Bayes factor. A selection of popular subreddits and notable differences
for each are shown (full details of subreddits most closely related to each pathway
community are provided in Ancillary Dataset 1).

Figure 3. The distributions of emotional factors, including “anger”, “sad”, “anxiety”,
and “negative emotion”, for the 1,403,411 users in the control group (grey), all 15,850
users in the r/conspiracy group (orange), and between 180 and 3,003 users in the 12
pathway communities (green). Distributions exhibiting a 1-deciban difference from the
control are labelled according to the direction of the difference.

Figure 4. The distributions of pronoun use factors, including “I”, “we”, “you”,
and “they”, for the 1,403,411 users in the control group (grey), all 15,850 users in
the r/conspiracy group (orange), and between 180 and 3,003 users in the 12 pathway
communities (green). Distributions exhibiting a 1-deciban difference from the control
are labelled according to the direction of the difference.

Comparisons with existing literature

The high scores for “anger” among users who went on to post in r/conspiracy are
unsurprising, and correspond to what is known about the role of anger in both
conspiracy theories and passing on misinformation more broadly [56]. By contrast, we
found no clear signals for “anxiety” (except in certain communities such as those
focused on drugs) or “sadness”. This casts some doubt on hypotheses which link
conspiracy belief to feelings of paranoia and alienation [1,14,41], the need for cognitive
closure [29], loss of control [49,57], or existential doubt in the face of uncertain
information [21,36]. In addition, we did not find increased first-person singular pronoun
usage, which has been used as a marker for mental illness in other studies of online
communities [17]. Each of these factors may be important for a subset of r/conspiracy
users, but they did not characterize the population as a whole.

The results suggest an alternative pathway into conspiracy belief based not in 
individual psychological factors but on social interactions facilitated by the structure of 
Reddit. The clear signal for “we” and the strong network clustering suggest the 
importance of social factors like in-group identification [7,12,20,47,59]. Homophily
based on shared beliefs has been shown to be a powerful means by which network 
structure co-evolves with cultural diversification [2,9], an effect which extends to online 
communities [3]. The structure of Reddit, which contains individual forums on 
specialized topics and mechanisms for providing positive and negative social feedback, 
may help to facilitate the formation and maintenance of echo chambers. 

There was also substantial heterogeneity exhibited across the clusters of users who
went on to become embedded in r/conspiracy. For example, while politics and political
ideologies feature as topics in many of the subreddits in which r/conspiracy users are
most over-represented, these spanned anarchism, socialism, conservatism, and
progressive ideologies. Conspiracy theories appear to provide a common cultural
touchstone for individuals with otherwise diverse interests, which, combined with the
ease of forming online connections, may be a sufficient basis for leading users into a
conspiracy belief culture. The relevant predisposing beliefs might thus be higher-order
ones, such as a belief in the general untrustworthiness of the government or the
existence of cover-ups [31,32,38-40,59,60]. This presents something of a counterweight
to the view that conspiracy communities are homogeneous and self-sealing [50].
Conspiracy theories play an important role in constructing individual narratives [44]
which can both converge with and diverge from shared narratives.

Implications and future research

Large datasets from online communities represent an important source of information
that shows how social interactions contribute to risk factors for conspiracy belief over
and above individual predispositions. Compared to surveying individuals, Reddit
comments are a noisy and indirect way of measuring these factors. Despite this
limitation, studies examining large scale social media data sources represent a valuable
complementary source of information about individuals and their motivations [24].
Further research is needed to clarify the relationship between the linguistic and social
factors at both the group and subgroup level. Posting in the same subreddits is a simple 
measure of social interaction, but it would be possible to define more complex measures 
that take into account responding to comments and voting patterns. Of particular 
interest might be whether individuals are drawn to r/conspiracy by interaction with 
already entrenched individuals. Our research only looked at target posters before they 
entered into r/conspiracy; further longitudinal analysis might reveal how the distinctive 
patterns we discovered change as they engage with the broader conspiracy forum. Other 
linguistic analyses might examine temporal changes in the concentration and divergence 
of word use to identify the formation of echo chambers signalled by the specialisation of 
language within communities at risk of conspiracy belief. 

Limitations

Our study was subject to several limitations. The dataset tracks Reddit users rather
than individuals, and individuals can have multiple user accounts. To minimise this
limitation, we limited the analysis to users posting with a minimum frequency over an
extended period of time. Similarly, we only used information about users who were
active participants in subreddits, and could not determine whether users were reading
forums without commenting (lurking). We think it is reasonable to take commenting to
indicate active participation. Differences in language are likely to be noisy proxies for
psychological states. However, similar linguistic analyses have been used to study other
psychological phenomena, and the robustness of our findings suggests that the results
are a reasonable signal of differences in psychology.

Conclusions

Reddit users who would go on to eventually post in r/conspiracy exhibited behaviours
that were consistently different from other users, including anger, negative emotion, and
the use of third person pronouns. These results confirmed what has been reported in
other studies focusing on the characteristics of people who hold conspiratorial beliefs.
When we examined community-level differences within this group of users, we found
that there was substantial variation in both the linguistic differences they exhibited as
well as their ideologies and interests. We also discovered variability in the use of
personal and group pronouns, which suggests the importance of users’ sense of
belonging within these forums. The results suggest that shared narratives constructed
from misinformation may be reinforced by the clustering of users and the nature of
social feedback in online forums and social media.

Supporting Information (SI)

This supporting information document includes further details of the methods and
results used in the manuscript: “Pathways to conspiracy: the social and linguistic
precursors of involvement in Reddit’s conspiracy theory forum”. The study design was a
retrospective case-control study in which the outcome of interest was active
participation in the r/conspiracy subreddit. We examined linguistic features and
community interactions (in other Reddit forums) as exposures that may have been
associated with the outcome.

Data Collection and Processing

The publicly available dataset was collected by Reddit user ‘Stuck_In_The_Matrix’
using the official Reddit API. Details are available at
https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/

Reddit allows posts by small automated programs known as ‘bots’, which can skew
descriptive statistics. To remove account names associated with bots, we first looked at
each poster in a target set of subreddits (including r/conspiracy) and calculated the
number of other subreddits in which they posted (their forum diversity). A list was
compiled of usernames whose forum diversity was more than 15 standard deviations
above the mean. Manual inspection revealed that every member of this list was
probably a bot, whereas more aggressive cuts also included posters who were clearly
human. This was combined with a list of usernames corresponding to known bots
posted on Reddit itself (from
https://www.reddit.com/r/botwatch/comments/1xojwh/list_of_320_reddit_bots/).
This process identified 466 bots in the set, and these were excluded from subsequent
analyses. This procedure erred on the side of including bots rather than eliminating
human posters.
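
As an illustration only, the forum-diversity cut described above could be sketched as follows. This is a minimal sketch, not the code used in the study: the function names, the (username, subreddit) input format, the use of the population standard deviation, and counting all distinct subreddits (rather than only subreddits other than the target set) are assumptions made for the example.

```python
from statistics import mean, pstdev


def forum_diversity(posts):
    """Map each username to the number of distinct subreddits they posted in.

    `posts` is an iterable of (username, subreddit) pairs (assumed format).
    """
    seen = {}
    for user, subreddit in posts:
        seen.setdefault(user, set()).add(subreddit)
    return {user: len(subs) for user, subs in seen.items()}


def flag_probable_bots(diversity, known_bots=(), n_sd=15.0):
    """Flag users whose diversity exceeds mean + n_sd standard deviations,
    then merge the result with a list of known bot usernames."""
    values = list(diversity.values())
    cutoff = mean(values) + n_sd * pstdev(values)  # population SD is an assumption
    flagged = {user for user, d in diversity.items() if d > cutoff}
    return flagged | set(known_bots)
```

With a cut as conservative as 15 standard deviations, only extreme outliers are flagged, which matches the stated preference for retaining some bots over discarding human posters.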

Identifying study population and control

To construct the control group, we identified users who never posted in r/conspiracy
and had posted at least 4 times in any 6 contiguous 30-day periods. To calculate days,
we converted all times from Coordinated Universal Time (UTC) to Australian Eastern
Standard Time (AEST). To be included in the r/conspiracy group, users must have
posted at least 3 times in r/conspiracy, and have posted at least 4 times in each of 6
contiguous 30-day periods immediately prior to their first post in r/conspiracy. Users
who failed to meet either criterion were not examined further. All comments by users in
the control group were included in the subsequent analysis. For the r/conspiracy user
group, only comments posted before the first post in r/conspiracy were included in the
analysis.
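
The r/conspiracy inclusion rule can be sketched in a few lines. This is a hedged illustration under simplifying assumptions: each comment is a (timestamp, subreddit) pair, each period is an exact span of 30 x 24 hours counted back from the first r/conspiracy comment, and the UTC to AEST conversion is omitted. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(days=30)


def meets_activity_criterion(comment_times, end, n_periods=6, min_per_period=4):
    """True if each of the n_periods 30-day windows ending at `end`
    contains at least min_per_period comments."""
    counts = [0] * n_periods
    for t in comment_times:
        age = end - t
        if timedelta(0) < age <= n_periods * PERIOD:
            # Index 0 is the window immediately before `end`.
            idx = (age - timedelta(microseconds=1)) // PERIOD
            counts[idx] += 1
    return all(c >= min_per_period for c in counts)


def in_conspiracy_group(comments, target="conspiracy", min_target_comments=3):
    """Apply both criteria: enough comments in the target subreddit, and
    sustained activity in the six periods before the first such comment."""
    target_times = sorted(t for t, s in comments if s == target)
    if len(target_times) < min_target_comments:
        return False
    first = target_times[0]
    prior = [t for t, _ in comments if t < first]
    return meets_activity_criterion(prior, end=first)
```

Only the comments in `prior` would then be carried forward into the linguistic and social analyses for a user in the r/conspiracy group.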

Characterisation of forums

We identified the set of Reddit forums in which r/conspiracy users were
over-represented before they first posted in r/conspiracy and denote these as pathway
subreddits. For each Reddit forum, we looked at the number of target and control
posters who had commented at least once. The degree of over-representation was
defined as the Bayes factor:

$r_s = \frac{P(\mathrm{Target} \mid \mathrm{Subreddit})}{P(\mathrm{Target})}$,

where the subreddit posters were restricted to target and control groups. We restricted
subsequent analysis to subreddits with $r_s > 4$; that is, to subreddits which had at
least 4 times higher proportions of r/conspiracy users than would be expected from
overall site statistics.
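
To make the over-representation ratio concrete, the sketch below computes it from simple counts. It assumes the ratio is the share of target users among a subreddit's posters divided by the share of target users among all included users, which is how the surrounding text describes it; the function name and arguments are hypothetical.

```python
def bayes_factor(target_in_sub, control_in_sub, n_target, n_control):
    """Over-representation of target users in one subreddit.

    target_in_sub / control_in_sub: users from each group who commented
    at least once in the subreddit; n_target / n_control: group sizes.
    """
    posters = target_in_sub + control_in_sub
    if posters == 0:
        raise ValueError("subreddit has no posters from either group")
    p_target_given_sub = target_in_sub / posters
    p_target = n_target / (n_target + n_control)
    return p_target_given_sub / p_target


# Hypothetical subreddit with 1,000 posters, 50 of them from the target group,
# using the group sizes reported in this study.
example = bayes_factor(50, 950, n_target=15_850, n_control=1_403_411)
keep_for_analysis = example > 4
```

Because target users make up only about 1.1% of all included users, a subreddit in which they account for 5% of posters already passes the threshold of 4.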

Linguistic features of users

We used Linguistic Inquiry and Word Count (LIWC) [42,54], a tool used to extract a
set of linguistic characteristics from written text corresponding to emotion, personality,
and language structure. To measure these characteristics, we pre-processed each
comment to remove escape characters, URLs, and any lines which began with a ‘>’,
which is typically used to signify text quoted from another author. Comments with
fewer than 3 words after processing were omitted, and authors with fewer than 3 total
comments after pre-processing were excluded from further analysis. We then
concatenated each user’s comments and parsed the concatenated comments with the
LIWC tool to produce values for a set of 32 linguistic factors for each user.
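
A rough sketch of this pre-processing is shown below. It is illustrative rather than a reproduction of the original pipeline: the regular expressions, the treatment of HTML entities as the "escape characters", and the assumption that quoted lines start with '>' (Reddit's Markdown quote marker, possibly stored as '&gt;') are all choices made for the example. The LIWC step itself is proprietary and is not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+")
ENTITY_RE = re.compile(r"&(?:amp|lt|gt|quot|nbsp);")  # assumed escape sequences


def clean_comment(body):
    """Drop quoted lines, URLs and escape sequences; collapse whitespace."""
    kept = []
    for line in body.splitlines():
        if line.lstrip().startswith((">", "&gt;")):
            continue  # text quoted from another author
        line = URL_RE.sub(" ", line)
        line = ENTITY_RE.sub(" ", line)
        kept.append(line)
    return " ".join(" ".join(kept).split())


def user_document(comments, min_words=3, min_comments=3):
    """Concatenate a user's cleaned comments, or return None if the user
    has too few usable comments to be analysed."""
    cleaned = [clean_comment(c) for c in comments]
    cleaned = [c for c in cleaned if len(c.split()) >= min_words]
    if len(cleaned) < min_comments:
        return None
    return " ".join(cleaned)
```

The string returned by `user_document` corresponds to the per-user concatenated text that would be passed to the linguistic analysis tool.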

As our analysis completely characterizes the population of interest, ordinary
inferential statistics are a poor guide to meaningful differences. We thus presented effect
sizes, calculated as exact Common Language Effect Size (CLES) scores [33]. A CLES
score represents the chance that a randomly chosen member of the target group would
have a higher score on the relevant measure than a randomly chosen member of the
control group. No difference between groups thus corresponds to a CLES of 0.5. Good
suggested that the deciban (a difference of 0.1 in base-10 log odds) is the smallest
meaningful unit of evidence [23,26]. As such, we report CLES scores which represent a
change of more than 1 deciban from even odds (that is, CLES values below about 0.44
or above about 0.56).
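
The two quantities above can be sketched as follows. This is a minimal illustration, not the study's implementation: ties are counted as half a win (an assumption, since the text does not say how ties were handled), and the function names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def cles(target, control):
    """Exact probability that a random target score exceeds a random
    control score, counting ties as one half (assumed convention)."""
    ctrl = sorted(control)
    wins = 0.0
    for x in target:
        below = bisect_left(ctrl, x)          # control scores strictly lower
        ties = bisect_right(ctrl, x) - below  # control scores equal to x
        wins += below + 0.5 * ties
    return wins / (len(target) * len(ctrl))


def decibans(p):
    """Evidence in decibans for a probability p: 10 * log10 of the odds.
    Undefined for p equal to 0 or 1."""
    return 10 * math.log10(p / (1 - p))


def at_least_one_deciban(p):
    """True if p differs from even odds (0.5) by one deciban or more."""
    return abs(decibans(p)) >= 1.0
```

Sorting the control scores once keeps the computation at O((n + m) log m), which matters when the control group has over a million users. Under this threshold, a CLES of 0.61 or 0.36 counts as a clear difference, while 0.54 does not.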

Community Structure Analysis

Community structure algorithms are used to identify clusters of well-connected nodes in
a network. Most algorithms aim to identify clusters by maximising the number of
connections within each community compared to the number of connections between
communities. The Blondel et al. [6] implementation of a community structure algorithm
is popular because it is relatively computationally inexpensive and produces useful
communities in large unweighted networks.

We constructed a network of r/conspiracy users where the weight of the connection
between any two r/conspiracy users was the number of subreddits in which they both
posted comments divided by the number of subreddits in which either of the two
r/conspiracy users posted comments. For example, two users that only posted in the
same subreddits and no others would have a connection of weight 1, and two users
posting in 10 and 5 subreddits respectively, with only 2 shared subreddits, would have a
connection weight of 0.154 (2 divided by 15 - 2 = 13). We then applied the community
structure algorithm to the network to give each r/conspiracy user a community number.
The algorithm assigns each user to exactly one community.
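
The connection weight in the worked example can be checked with a short sketch. The function name and the representation of each user as a set of subreddit names are assumptions for illustration; the community detection step itself is not shown.

```python
def jaccard_weight(subs_a, subs_b):
    """Shared subreddits divided by subreddits used by either user."""
    a, b = set(subs_a), set(subs_b)
    union = a | b
    if not union:
        return 0.0  # neither user posted anywhere
    return len(a & b) / len(union)


# The example from the text: 10 and 5 subreddits, 2 of them shared.
user_a = {f"sub{i}" for i in range(10)}
user_b = {"sub8", "sub9", "other1", "other2", "other3"}
weight = round(jaccard_weight(user_a, user_b), 3)
```

Here the union contains 10 + 5 - 2 = 13 subreddits, so the weight is 2/13, which rounds to 0.154 as stated.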

Community structure algorithms are unsupervised clustering methods, so they do
not use a pre-specified set of labels for separating users into groups by topic and,
without support from other methods, cannot be used to summarise what makes a
community different from other communities. To characterise each of the pathway
communities, we firstly labelled each of the subreddits used in the analyses with the
pathway community that was most over-represented by proportion of users, and then
looked for common themes across the set of subreddits assigned to each pathway
community. To determine the over-representation of a pathway community within a
subreddit, we calculated the individual Bayes factor for each community, and selected
the pathway community in which the Bayes factor was the highest.


Study Populations

We identified 15,850 users who met our inclusion criteria for the r/conspiracy group and
1,403,411 users who met our inclusion criteria for the control group. The r/conspiracy
users posted comments in a total of 38,753 subreddits, of which 1,499 had a Bayes factor
of at least 4 and at least 1,000 users posting comments in the subreddit (Figure S1).

Figure S1. The number of users, subreddits, and comments included in the analyses.
From 1.10 billion comments posted in 224,625 subreddits in the dataset, we examined
in detail 128 million comments posted in the 1,499 subreddits that were pathways to
posting comments in r/conspiracy.


Linguistic characteristics of r/conspiracy users versus control

In the LIWC analysis, we tested 32 linguistic features, and found that for 22 of them,
there was a clear difference between r/conspiracy users and control users (Table S1).
Note that while there were 10 linguistic features in which we found no difference
between the control group and the r/conspiracy group in its entirety, there may have
been individual pathway communities that exhibited substantially higher or lower values
for these LIWC categories (see Figure S3).

The CLES values for each of the 93 LIWC categories are presented in Table S1. In
the table, the 32 linguistic features included in the analyses are highlighted in bold.
Features exhibiting a significant difference from the control group were determined by a
2-tailed t-test with a significance level of 0.05, corrected for multiple comparisons.

Community structure and subreddit representation among r/conspiracy users

Ancillary Dataset 1 includes details of the 1,499 subreddits in which r/conspiracy users
were over-represented; the number of users and comments posted by those users from
the r/conspiracy and control groups; and the pathway community that had the greatest
number of users posting in that subreddit.

Figure S2 presents a network visualisation of the subreddits in which r/conspiracy
users were over-represented.

Figure S3 provides full details of the linguistic differences of each of the pathway
communities that make up the r/conspiracy user group.

Table S1. Comparison of target and control group for each of the 93 LIWC categories.

Term       sig   CLES    Term         sig   CLES    Term          sig   CLES
wc         *     0.60    negemo       *     0.60    focuspast     *     0.48
Analytic   n.s.  0.50    anx          *     0.54    focuspresent  *     0.51
Clout      *     0.58    anger        *     0.61    focusfuture   *     0.46
Authentic  *     0.42    sad          n.s.  0.49    relativ       *     0.44
Tone       *     0.36    social       *     0.58    motion        *     0.45
WPS        n.s.  0.48    family       *     0.54    space         n.s.  0.49
Sixltr     *     0.58    friend       n.s.  0.49    time          *     0.41
Dic        *     0.55    female       *     0.54    work          *     0.57
function   *     0.53    male         n.s.  0.52    leisure       *     0.43
pronoun    n.s.  0.50    cogproc      *     0.54    home          n.s.  0.51
ppron      *     0.47    insight      *     0.56    money         *     0.56
i          *     0.42    cause        *     0.56    relig         *     0.61
we         *     0.58    discrep      n.s.  0.50    death         *     0.58
you        *     0.53    tentat       *     0.48    informal      *     0.52
shehe      n.s.  0.52    certain      *     0.58    swear         *     0.59
they       *     0.60    differ       n.s.  0.50    netspeak      *     0.47
ipron      *     0.55    percept      *     0.46    assent        *     0.47
article    *     0.53    see          *     0.45    nonflu        *     0.48
prep       *     0.51    hear         *     0.53    filler        n.s.  0.46
auxverb    *     0.54    feel         *     0.42    AllPunc       *     0.52
adverb     *     0.46    bio          *     0.55    Period        *     0.55
conj       *     0.48    body         *     0.53    Comma         *     0.53
negate     *     0.59    health       *     0.58    Colon         *     0.45
verb       *     0.50    sexual       *     0.59    SemiC         *     0.49
adj        *     0.46    ingest       n.s.  0.52    QMark         *     0.54
compare    *     0.47    drives       *     0.51    Exclam        *     0.46
interrog   *     0.58    affiliation  *     0.48    Dash          n.s.  0.49
number     *     0.42    achieve      *     0.41    Quote         *     0.55
quant      *     0.48    power        *     0.61    Apostro       *     0.48
affect     *     0.49    reward       *     0.39    Parenth       *     0.48
posemo     *     0.41    risk         *     0.58    OtherP        n.s.  0.48

CLES: Common Language Effect Size; sig: significance; n.s.: not significant.

Figure S2. A network visualisation of the 1,499 subreddits in which r/conspiracy
users were over-represented. The network is a weighted and undirected network in
which the weight of the connections is defined by the proportion of shared r/conspiracy
users (r/conspiracy users who posted in both subreddits) divided by the proportion of
r/conspiracy users who did not post in both subreddits. In the visualisation, the areas
of the nodes are proportional to the number of r/conspiracy users who posted in them,
the colours (from cyan to red) represent increasing Bayes factors (from 4.00 to 89.47),
and the width of the connections represents the weights of the links (given by shared
r/conspiracy users as defined above). Nodes are positioned via a heuristic (Force Atlas 2
in Gephi [5]) such that well-connected clusters of nodes are positioned close together.

Figure S3. Linguistic differences of each of the pathway communities that make up
the r/conspiracy user group.